In the article Fitting Percentage of Body Fat to Simple Body Measurements, Johnson (1996) uses the data at http://jse.amstat.org/datasets/fat.dat.txt provided to him by Dr. A. Garth Fischer in a personal communication on October 5, 1994, as a multiple linear regression activity with his students. A subset of the variables at http://jse.amstat.org/datasets/fat.dat.txt is available in the R package mfp by (R-mfp?) and the data set is used frequently in the text Statistical Regression and Classification by Matloff (2017).

The purpose of this activity is to have the reader critically question, evaluate, and clean the original data provided at http://jse.amstat.org/datasets/fat.dat.txt. The guided activities will reinforce reading data into R using the fread() function from the data.table package written by Dowle and Srinivasan (2023), creating graphs with both packages ggplot2 and plotly written by Wickham, Chang, et al. (2023) and Sievert et al. (2022), respectively, and creating new variables with functions such as mutate() from the dplyr package written by Wickham, François, et al. (2023).


Type complete sentences to answer all questions inside the answer tags provided in the R Markdown document. Round all numeric answers you report inside the answer tags to four decimal places. Use inline R code to report numeric answers inside the answer tags (i.e. do not hard code your numeric answers).

The article by Johnson (1996) defines body-fat determined with the brozek and siri methods as well as fat free weight using Equations (1), (2), and (3), respectively.

\[\begin{equation} \text{bodyfatBrozek} = \frac{457}{\text{density}} - 414.2 \tag{1} \end{equation}\]

\[\begin{equation} \text{bodyfatSiri} = \frac{495}{\text{density}} - 450 \tag{2} \end{equation}\]

\[\begin{equation} \text{FatFreeWeight} = \left(1 -\frac{\text{brozek}}{100}\times \text{weight_lbs}\right) \tag{3} \end{equation}\] Body Mass Index (BMI) is defined in Equation (4).

\[\begin{equation} \text{BMI} = \frac{\text{kg}}{\text{m}^2} \tag{4} \end{equation}\]

Please use the following conversion factors with this project: 0.453592 kilos per pound and 2.54 centimeters per inch.

For example, a male who weighs 185 pounds and is 71 inches tall has a weight of 83.91 kg and a height of 1.8034 m.

\[\begin{equation} 185 \text{ pounds} \times \frac{0.453592 \text{ kg}}{\text{pounds}} = 83.91452 \text{ kg} \end{equation}\]

\[\begin{equation} 71 \text{ inches} \times \frac{2.54 \text{ cm}}{\text{inches}}\times \frac{1 \text {m}}{100 \text{ cm}} = 1.8034 \text{ m} \end{equation}\]


To read tabular data from the internet, one might use the read.table() function or the fread() function from the data.table package. The general structure to read data from the internet is to provide the URL in the file or input arguments of the read.table() or fread() functions, respectively.

# Example structure
DF1 <- read.table(file = "http://some.url.com")
# load data.table package
library(data.table)
DF2 <- fread(input = "http://some.url.com")

  1. Read the original data from http://jse.amstat.org/datasets/fat.dat.txt into your current R session. Specifically, start by using the fread() function from the data.table package to read the data from http://jse.amstat.org/datasets/fat.dat.txt into an object named bodyfat. The data dictionary for http://jse.amstat.org/datasets/fat.dat.txt is available at http://jse.amstat.org/datasets/fat.txt.
    Pass the following vector of names to the col.names argument of fread(): c("case", "brozek", "siri", "density", "age", "weight_lbs", "height_in", "bmi", "fat_free_weight", "neck_cm", "chest_cm", "abdomen_cm", "hip_cm", "thigh_cm", "knee_cm", "ankle_cm", "biceps_cm", "forearm_cm", "wrist_cm").
# Type your code and comments inside the code chunk
# Obtaining the original data 
library(data.table)
url <- "http://jse.amstat.org/datasets/fat.dat.txt"

bodyfat <- fread(input = url, col.names = c("case", "brozek", "siri", "density", "age", "weight_lbs", "height_in", "bmi", "fat_free_weight", "neck_cm", "chest_cm", "abdomen_cm", "hip_cm", "thigh_cm", "knee_cm", "ankle_cm", "biceps_cm", "forearm_cm", "wrist_cm"))

if(!dir.exists("./data/")){
  dir.create("./data/")
}
if(!file.exists("./data/fat.dat.txt")){
  download.file(url, destfile = "./data/fat.dat.txt")
}

bodyfat <- fread(input = "./data/fat.dat.txt",
                 col.names = c("case", "brozek", "siri", "density", "age", "weight_lbs",
                                            "height_in", "bmi", "fat_free_weight", "neck_cm",
                                            "chest_cm", "abdomen_cm", "hip_cm", "thigh_cm",
                                            "knee_cm", "ankle_cm", "biceps_cm", "forearm_cm",
                                            "wrist_cm"))

  1. Use the read.table() function to read the data from http://jse.amstat.org/datasets/fat.dat.txt into an object named bodyfat2. Pass the same vector created in Problem 1 to the col.names argument of read.table().
# Type your code and comments inside the code chunk
# Obtaining the original data

bodyfat2 <-read.table(file = url,
                      col.names = c("case", "brozek", "siri", "density", "age", "weight_lbs",
                                            "height_in", "bmi", "fat_free_weight", "neck_cm",
                                            "chest_cm", "abdomen_cm", "hip_cm", "thigh_cm",
                                            "knee_cm", "ankle_cm", "biceps_cm", "forearm_cm",
                                            "wrist_cm"))

#read from local folder (read.table)

bodyfat2 <- read.table(file = "./data/fat.dat.txt",
                       col.names = c("case", "brozek", "siri", "density", "age", "weight_lbs", "height_in", "bmi", "fat_free_weight", "neck_cm", "chest_cm", "abdomen_cm", "hip_cm", "thigh_cm", "knee_cm", "ankle_cm", "biceps_cm", "forearm_cm", "wrist_cm"))

The basic structure of a function in R is

function_name <- function(argument1, argument2, ...){
  expression
}

The function curve() draws a curve corresponding to a function of the interval [from, to]. For example, to draw the function \(y = x^2\) from -3 to 3 use the following code to obtain Figure 1.

quad <- function(x){x^2}
curve(quad, from = -3, to = 3)
Using the function `curve()` to draw a quadratic function

Figure 1: Using the function curve() to draw a quadratic function

To superimpose a function on an open graphics device with curve(), use the argument add = TRUE. For example, to graph \(y = x^2\) with a blue line and \(y = 4 + \sin(8x)\) with a red line from \(-\pi\) to \(\pi\), use the following code:

quad <- function(x){x^2}

curve(quad, from = -pi, to = pi, col = "blue", ylab = "f(x)")

nsin <- function(x){4 + sin(8*x)}

curve(nsin, add = TRUE, col = "red")


  1. Graph the relationship between brozek and density given in Equation (1) with a blue line as well as the relationship between siri and density given in Equation (2) with a red line. First, define functions fb and fs for Equations (1) and (2), respectively. Use the function curve() to draw Equations (1) and (2) on the same graphics device for density values between 1.0 \(\text{gm/cm}^3\) and 1.10 \(\text{gm/cm}^3\).
# Type your code and comments inside the code chunk
# Functions fb and fs 

fb <- function(density){(457/density) - 414.2}
fs <- function(density){(495/density) - 450}

curve(fb, from = 1.0, to = 1.10, n = 200,
      xlab = expression(paste("density in", gm/cm^3)),
      ylab = " body fat", col = "blue", ylim = c(0,50))

curve(fs, from = 1.0, to = 1.10, n = 200)


Code to construct the requested graph using ggplot2 code is given below and the result is visible in Figure 2.

library(ggplot2)
p <- ggplot(data = data.frame(x = c(1, 1.1), y = c(0, 50)), aes(x = x, y = y)) 
p + stat_function(fun = fb, color = "blue") +
    stat_function(fun = fs, color = "red") +
    labs(x = expression(paste("density in ", gm/cm^3)), y = "bodyfat") +
    theme_bw()
Realtionship between bodyfat and density according to Brozek and Siri definitions

Figure 2: Realtionship between bodyfat and density according to Brozek and Siri definitions

  1. How would you characterize the theoretical relationship given in Equations (1) and (2)?

The relationship between body-fat and density for the Equations ((1) and (2)) is a linear relationship and negative. The body fat linearly decreases as the density increases.


Since the relationship between body fat and density for both Equations (1) and (2) is linear, one way to check for unusual observations is to plot the actual values for brozek versus density or siri versus density using the data stored in bodyfat or bodyfat2. Points that do not fall along a straight line should be scrutinized for possible data entry errors or other possible explanations.

The plotly package allows the user to create interactive graphs or to convert ggplot2 graphs to plotly graphs using the ggplotly() function. Given a data frame say DF with variables X and Y, the following template code will create an interactive plotly scatterplot.

library(plotly)
library(ggplot2)
# Create ggplot scatterplot named p
p <- ggplot(data = DF, aes(x = X, y = Y)) + 
  geom_point()
# Convert ggplot scatterplot to plotly with ggplotly()
p2 <- ggplotly(p)
# Display graph
p2

Consider an interactive scatterplot showing the distance required to stop for vehicles traveling at different speeds using the cars data frame.

library(plotly)
library(ggplot2)
# Create ggplot scatterplot named p
p <- ggplot(data = cars, aes(x = speed, y = dist)) + 
  geom_point()
# Convert ggplot scatterplot to plotly with ggplotly()
p2 <- ggplotly(p)
# Display graph
p2

  1. Create interactive plotly scatterplots of siri versus density with case mapped to color, weight_lbs versus height_in with case mapped to color, and ankle_cm versus weight_lbs with case mapped to color to help identify potential outliers. Identify the case numbers for the outliers from your three plots.
# Type your code and comments inside the code chunk
# Creating interactive scatterplot of siri versus density

# Change code below to your plot
sdplot <- ggplot(data = bodyfat, aes(x = density, y = siri, color = case)) +
  geom_point() +
  theme_bw()
sdplot
Plot of `siri` versus `density`

Figure 3: Plot of siri versus density

sdp <- ggplotly(sdplot)
sdp

Figure 3: Plot of siri versus density

# Type your code and comments inside the code chunk
# Creating interactive scatterplot of weight_lbs versus height_in

# Change code below to your plot
whplot <- ggplot(data = bodyfat, aes(x = height_in, y = weight_lbs, color = case)) +
  geom_point() +
  theme_bw()
whplot
Plot of `weight_lbs` versus `height_in`

Figure 4: Plot of weight_lbs versus height_in

whp <- ggplotly(whplot)

whp

Figure 4: Plot of weight_lbs versus height_in

# Type your code and comments inside the code chunk
# Creating interactive scatterplot of ankle_cm versus weight_lbs

# Change code below to your plot

awplot <- ggplot(data = bodyfat, aes(x = weight_lbs, y = ankle_cm, color = case)) +
  geom_point() + 
  theme_bw()
awplot
Interactive scatterplot of `ankle_cm` versus `weight_lbs`

Figure 5: Interactive scatterplot of ankle_cm versus weight_lbs

awp <- ggplotly(awplot)

awp

Figure 5: Interactive scatterplot of ankle_cm versus weight_lbs

There is a total of 9 outliers in the three plots combined. The case outliers in the siri versus density plot are 216, 96, 76, 48, 182. The case outliers in the weight_lbs versus height_in are 42 and 39. The case outliers in the ankle_cm versus weight_lbs are 86, 31, and 39.


  1. Create a subset of the bodyfat data frame named BF consisting only of the cases identified as outliers in Problem 5. Use equation (2) to create a new variable in BF named density_C (computed density) based on the reported siri values. Use equation (4) to reverse engineer the computed height in inches based on the values in weight_lbs and bmi using the conversion factors given at the start of the lab. Store the computed heights in inches in a variable named height_in_C. Use the verb mutate from the dplyr package to create both density_C and height_in_C. Show the values of the selected outliers for columns case, density_C, height_in, height_in_C, and ankle_cm. What do you notice about the density and density_C values in BF for the scatterplot you created in Figure 3? What do you notice about the height_in values in BF for the scatterplot you created in Figure 4? What do you notice about the ankle_cm values in BF for the scatterplot you created in Figure 5?
# Type your code and comments inside the code chunk
# Isolating points of interest
library(dplyr)

BF <- bodyfat %>%
   filter(case == 216 |
           case == 76 |
           case == 96 |
           case == 48 |
           case == 182 |
           case == 39 |
           case == 42 |
           case == 86 |
           case == 31 |
           case == 39)
  
BF <- BF %>%
  mutate(density_C = (495/(siri + 450)),
         weight_KG = (weight_lbs * 0.453592),
         height_in_C = (sqrt(weight_KG/bmi)*(100/2.54))) %>%
  select('case', 'density_C', 'density', 'height_in', 'height_in_C', 'ankle_cm') 
BF
   case density_C density height_in height_in_C ankle_cm
1:   31 1.0716605  1.0716     73.75    73.63405     33.9
2:   39 1.0201979  1.0202     72.25    72.25827     29.6
3:   42 1.0250569  1.0250     29.50    69.42890     23.7
4:   48 1.0864794  1.0665     71.25    71.19157     21.9
5:   76 1.0565635  1.0666     67.50    67.46501     21.4
6:   86 1.0386068  1.0386     67.50    67.20020     33.7
7:   96 1.0590501  1.0991     77.75    77.76549     23.2
8:  182 1.1000000  1.1089     68.00    67.84516     20.2
9:  216 0.9949749  0.9950     64.00    63.99221     23.6

The calculated densities values for the BF plot are slightly higher than the provided density values. The difference is likely due to rounding.


  1. Change the reported density values of bodyfat based on your answers from Problem 6. Change the height_in value for case 42 to the value you reverse engineered in Problem 6 rounding to the nearest half inch.
# Type your code and comments inside the code chunk
# Updating computed bodyfat values and bmi measurements
# Replacing identified typos of density
bodyfat$density[bodyfat$case == "96"] <- 1.0590501 
bodyfat$density[bodyfat$case == "182"] <- 1.1000000
bodyfat$density[bodyfat$case == "76"] <- 1.0565635  
bodyfat$density[bodyfat$case == "216"] <- 0.9949749 
bodyfat$density[bodyfat$case == "48"] <- 1.0864794 

# Replacing identified typos in height_in

bodyfat$height_in[bodyfat$case == "42"] <- 69.40

# Replacing identified typos in ankle_cm
 
bodyfat$ankle_cm[bodyfat$case == "31"] <- 23.9
bodyfat$ankle_cm[bodyfat$case == "86"] <- 23.7

  1. Create variables named siri_C, brozek_C, and bmi_C, that compute body-fat values rounded to one decimal place according to equations (2), (1), and (4), respectively. Replace the values in siri and brozek with the computed values in siri_C and brozek_C for case 182.
# Type your code and comments inside the code chunk
  
bmi_C <- bodyfat$weight_lbs/((bodyfat$height_in*0.0254)^2)
  
num8 <- bodyfat %>%
  filter(case == 182) %>%
  mutate(siri_C = (495/density)-450,
         brozek_C = (457/density)-414.2) %>%
  select(case, siri_C, brozek_C)
round(num8, 1)
   case siri_C brozek_C
1:  182      0      1.3
bodyfat$siri[bodyfat$case == "182"] <- 0
bodyfat$brozek[bodyfat$case == "182"] <- 1.3

  1. Consider differences between brozek and brozek_C as well as differences between siri and siri_C greater than 0.11 to be some type of data entry problem. Use the verbs filter() and select() to show the values for case, siri, siri_C, brozek, brozek_C, and density that are considered data entry problems. If both the computed value for siri and brozek differ from the reported values of siri and brozek, it is likely a data entry problem with the density value. If one of either siri or brozek agrees with its computed value, the one that does not agree with the computed value is likely a data entry problem. Which cases seem to have data entry problems and for what reasons?
# Type your code and comments inside the code chunk
# Identifying bodyfat typos for brozek and siri

bodyfat <- bodyfat %>%
  mutate(siri_C = round(495 / density - 450, 1),
         brozek_C = round(457 / density- 414.2, 1),
         siri_diff = abs(siri_C - siri),
         brozek_diff = abs(brozek_C- brozek))

bodyfat_diff <- bodyfat %>%
  filter(siri_diff > 0.11 | brozek_diff > 0.11) %>%
  select('case', 'siri', 'siri_C', 'brozek', 'brozek_C', 'density', 'siri_diff', 'brozek_diff')

bodyfat_diff
    case siri siri_C brozek brozek_C density siri_diff brozek_diff
 1:    6 20.9   21.3   20.6     21.0  1.0502       0.4         0.4
 2:   11  7.1    7.1    7.5      7.8  1.0830       0.0         0.3
 3:   33 11.8   11.8   13.4     12.1  1.0719       0.0         1.3
 4:   49 13.6   13.6   13.4     13.8  1.0678       0.0         0.4
 5:   98 11.3   11.3   11.1     11.7  1.0730       0.0         0.6
 6:  152 19.6   19.6   19.1     19.3  1.0542       0.0         0.2
 7:  169 34.3   36.2   34.7     34.7  1.0180       1.9         0.0
 8:  200 23.6   23.1   22.6     22.6  1.0462       0.5         0.0
 9:  235 25.8   25.8   25.5     25.1  1.0403       0.0         0.4
10:  237 24.8   24.9   24.0     24.2  1.0424       0.1         0.2

For density, the cases 6 and 237 have issues with data entry.

For siri, the cases 6, 169, and 200 have issues with data entry.

For brozek, the cases 11, 33, 49, 98, 152, 169, 235, and 237 have issues with data entry.


  1. Fix any data entry errors you identified in Problem 9 in the bodyfat data frame. Make sure to update the siri_C, brozek_C, and bmi_C variables in bodyfat using your corrected values from Problem 9. Write code to verify that there are no differences between brozek and brozek_C or any differences between siri and siri_C greater than 0.11.
# Type your code and comments inside the code chunk
# Replacing bodyfat typos for brozek and siri

bodyfat$siri[bodyfat$case == 6] <- 21.338793
bodyfat$siri[bodyfat$case == 169] <- 36.247544  
bodyfat$siri[bodyfat$case == 200] <- 23.140891

bodyfat$brozek[bodyfat$case == 6] <- 20.955209
bodyfat$brozek[bodyfat$case == 11] <- 7.775993
bodyfat$brozek[bodyfat$case == 33] <- 12.145741
bodyfat$brozek[bodyfat$case == 49] <- 13.782768
bodyfat$brozek[bodyfat$case == 98] <- 11.708667
bodyfat$brozek[bodyfat$case == 152] <- 19.304079
bodyfat$brozek[bodyfat$case == 169] <- 34.719450
bodyfat$brozek[bodyfat$case == 235] <- 25.096357
bodyfat$brozek[bodyfat$case == 237] <- 24.211358

# Updating siri_C, brozek_C, and bmi_C

bodyfat <- bodyfat %>%
  mutate(siri_C = round(495 / density - 450, 1),
         brozek_C = round(457 / density- 414.2, 1),
         siri_diff = abs(siri_C - siri),
         brozek_diff = abs(brozek_C- brozek))

# Checking for typos according to given criteria

bodyfat_diff2 <- subset(bodyfat) %>%
               filter(siri_diff > 0.11 | brozek_diff > 0.11) %>%
  select('case', 'siri', 'siri_C', 'brozek', 'brozek_C', 'density', 'siri_diff', 'brozek_diff')

bodyfat_diff2
Empty data.table (0 rows and 8 cols): case,siri,siri_C,brozek,brozek_C,density...


  1. Create interactive plotly scatterplots of siri_C versus density with case mapped to color and brozek_C versus density with case mapped to color using the modified bodyfat data frame. Superimpose equations (2) and (1) over their corresponding scatterplots. Comment on the scatterplots.
# interactive plotly scatterplot siri_C vs. density

# Change code below to your plot
newplot <- ggplot(data = bodyfat, aes(x = density, y = siri_C, color = case)) + 
  geom_point() + 
  theme_bw()
ggplotly(newplot)
# interactive plotly scatterplot brozek_C vs. density

# Change code below to your plot
newplot2 <- ggplot(data = bodyfat, aes(x = density, y = brozek_C, color = case)) + 
  geom_point() + 
  theme_bw()
ggplotly(newplot2)

After modifying the bodyfat dataset, there are no outliers in either plot.


  1. How many of the values do you think are potentially rounding errors? Explain your reasoning and show the code you used to identify the errors.
# Type your code and comments inside the code chunk
# Number of rounding discrepancies for siri

sum(bodyfat$siri_C != bodyfat$siri)
[1] 32
# Number of rounding discrepancies for brozek

sum(bodyfat$brozek_C != bodyfat$brozek)
[1] 46
# Number of rounding discrepancies for bmi

sum(bodyfat$bmi_C != bodyfat$bmi)
[1] 0

There are 32 rounding errors for siri, 46 rounding errors for brozek, and 0 for bmi.


  1. What additional variables might you explore to check the quality of the reported data?

The computed values for brozek and siri can be explored to check the quality of the reported data. The relationship is explored through a scatterplot.

# Change code below to your plot
newplot3 <- ggplot(data = bodyfat, aes(x = brozek_C, y = siri_C, color = case)) + 
  geom_point() +
  theme_bw()
ggplotly(newplot3)

  1. Create an object named bodyfatClean that excludes the variables brozek, siri, and bmi from the bodyfat data frame. Use the function write.csv() or write_csv() to store the bodyfatClean object as a CSV file.
# writing bodyfat to bodyfatClean.csv

bodyfatClean <- bodyfat[, -c("brozek", "siri", "bmi")]
bodyfatClean
     case density age weight_lbs height_in fat_free_weight neck_cm chest_cm
  1:    1  1.0708  23     154.25     67.75           134.9    36.2     93.1
  2:    2  1.0853  22     173.25     72.25           161.3    38.5     93.6
  3:    3  1.0414  22     154.00     66.25           116.0    34.0     95.8
  4:    4  1.0751  26     184.75     72.25           164.7    37.4    101.8
  5:    5  1.0340  24     184.25     71.25           133.1    34.4     97.3
 ---                                                                       
248:  248  1.0736  70     134.25     67.00           118.9    34.9     89.2
249:  249  1.0236  72     201.00     69.75           136.1    40.9    108.5
250:  250  1.0328  72     186.75     66.00           133.9    38.9    111.1
251:  251  1.0399  72     190.75     70.50           142.6    38.9    108.3
252:  252  1.0271  74     207.50     70.00           143.7    40.8    112.4
     abdomen_cm hip_cm thigh_cm knee_cm ankle_cm biceps_cm forearm_cm wrist_cm
  1:       85.2   94.5     59.0    37.3     21.9      32.0       27.4     17.1
  2:       83.0   98.7     58.7    37.3     23.4      30.5       28.9     18.2
  3:       87.9   99.2     59.6    38.9     24.0      28.8       25.2     16.6
  4:       86.4  101.2     60.1    37.3     22.8      32.4       29.4     18.2
  5:      100.0  101.9     63.2    42.2     24.0      32.2       27.7     17.7
 ---                                                                          
248:       83.6   88.8     49.6    34.8     21.5      25.6       25.7     18.5
249:      105.0  104.5     59.6    40.8     23.2      35.2       28.6     20.1
250:      111.5  101.7     60.3    37.3     21.5      31.3       27.2     18.0
251:      101.3   97.8     56.0    41.6     22.7      30.5       29.4     19.8
252:      108.5  107.1     59.3    42.2     24.6      33.7       30.0     20.9
     siri_C brozek_C siri_diff brozek_diff
  1:   12.3     12.6       0.0           0
  2:    6.1      6.9       0.0           0
  3:   25.3     24.6       0.0           0
  4:   10.4     10.9       0.0           0
  5:   28.7     27.8       0.0           0
 ---                                      
248:   11.1     11.5       0.1           0
249:   33.6     32.3       0.0           0
250:   29.3     28.3       0.0           0
251:   26.0     25.3       0.0           0
252:   31.9     30.7       0.0           0
write.csv(bodyfatClean, "./bodyfatClean.csv",
          row.names = FALSE)

This material is released under an Attribution-NonCommercial-ShareAlike 3.0 United States license. Original author: Alan T. Arnholt


References

Dowle, Matt, and Arun Srinivasan. 2023. Data.table: Extension of ‘Data.frame‘. https://CRAN.R-project.org/package=data.table.
Johnson, Roger W. 1996. “Fitting Percentage of Body Fat to Simple Body Measurements.” Journal of Statistics Education 4 (1). https://doi.org/10.1080/10691898.1996.11910505.
Matloff, Norman. 2017. Statistical Regression and Classification: From Linear Models to Machine Learning. 1 edition. Boca Raton: Chapman; Hall/CRC.
Sievert, Carson, Chris Parmer, Toby Hocking, Scott Chamberlain, Karthik Ram, Marianne Corvellec, and Pedro Despouy. 2022. Plotly: Create Interactive Web Graphics via Plotly.js. https://CRAN.R-project.org/package=plotly.
Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara Woo, Hiroaki Yutani, and Dewey Dunnington. 2023. Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. https://CRAN.R-project.org/package=ggplot2.
Wickham, Hadley, Romain François, Lionel Henry, Kirill Müller, and Davis Vaughan. 2023. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.